5 research outputs found

    MEDEA: A Hybrid Shared-memory/Message-passing Multiprocessor NoC-based Architecture

    Get PDF
    The shared-memory model has been adopted, both for data exchange as well as synchronization using semaphores in almost every on-chip multiprocessor implementation, ranging from general purpose chip multiprocessors (CMPs) to domain specific multi-core graphics processing units (GPUs). Low-latency synchronization is desirable but is hard to achieve in practice due to the memory hierarchy. On the contrary, an explicit exchange of synchronization tokens among the processing elements through dedicated on-chip links would be beneficial for the overall system performance. In this paper we propose the Medea NoC-based framework, a hybrid shared-memory/message-passing approach. Medea has been modeled with a fast, cycle-accurate SystemC implementation enabling a fast system exploration varying several parameters like number and types of cores, cache size and policy and NoC features. In addition, every SystemC block has its RTL counterpart for physical implementation on FPGAs and ASICs. A parallel version of the Jacobi algorithm has been used as a test application to validate the metodology. Results confirm expectations about performance and effectiveness of system exploration and desig

    MEDEA: A Hybrid Shared-memory/Message-passing Multiprocessor NoC-based Architecture

    Get PDF
    The shared-memory model has been adopted, both for data exchange as well as synchronization using semaphores in almost every on-chip multiprocessor implementation, ranging from general purpose chip multiprocessors (CMPs) to domain specific multi-core graphics processing units (GPUs). Low-latency synchronization is desirable but is hard to achieve in practice due to the memory hierarchy. On the contrary, an explicit exchange of synchronization tokens among the processing elements through dedicated on-chip links would be beneficial for the overall system performance. In this paper we propose the Medea NoC-based framework, a hybrid shared-memory/message-passing approach. Medea has been modeled with a fast, cycle-accurate SystemC implementation enabling a fast system exploration varying several parameters like number and types of cores, cache size and policy and NoC features. In addition, every SystemC block has its RTL counterpart for physical implementation on FPGAs and ASICs. A parallel version of the Jacobi algorithm has been used as a test application to validate the metodology. Results confirm expectations about performance and effectiveness of system exploration and design

    Implementation Analysis of NoC: A MPSoC Trace-Driven Approach

    No full text
    This paper proposes to tackle Networks-on-chip design for MPSoC on the assumption that area and power overhead control is the primary goal. Analysis of topologies and routing strategies is performed by comparing two approaches, the wormhole and hot potato, both theoretically and using real synthesized data in 0.13ĂŽÂĽm technology. It is shown that the hot potato solution is competitive and possibly better for both occupation and dissipation, while its performance, measured by simulation on real multiprocessor traces, is not worse than the wormhole case

    The NoCRay Graphic Accelerator: a Case-study for MP-SoC Network-on-Chip Design Methodology

    No full text
    The many-core design paradigm requires flexible and modular hardware and software components to provide the required scalability of next-generation on-chip multiproces- sor architectures. A multidisciplinary approach is necessary to consider all the interactions between the different components of the design. In this work a complete design methodology is proposed, tackling at once the aspects of hardware architecture, programming model and design automation. The proposed design flow has been used in the implementation of a multiprocessor Network-on-Chip based system, the NoCRay graphic accelerator. The system uses 8 Tensilica LX processors and has been physi- cally implemented on a Xilinx Virtex-4 LX-160 FPGA reporting a 17.3M equivalent gate-count. Performance are compared with a commercial general purpose processor and show good results considering the low frequency of the prototyp

    The Role of Vascular Aging in Atherosclerotic Plaque Development and Vulnerability

    No full text
    corecore